Use of Bayesian Network in Information Extraction from Unstructured Data Sources
Abstract
This paper applies Bayesian networks to support information extraction from unstructured, ungrammatical, and incoherent data sources for semantic annotation. A tool has been developed that combines ontologies, machine learning, information extraction, and probabilistic reasoning techniques to support the extraction process. Data acquisition is guided by knowledge specified in the form of an ontology. Because the amount of information available varies across data sources, the extracted data often contains missing values for certain variables of interest, and it is desirable to predict those values. The methodology presented in this paper first learns a Bayesian network from the training data and then uses it to predict missing data and to resolve conflicts. Experiments have been conducted to analyze the performance of the methodology. The results look promising: the methodology achieves a high degree of precision and recall for information extraction and reasonably good accuracy for predicting missing values.
Keywords: Information Extraction, Bayesian Network, Ontology, Machine Learning
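The idea of learning a Bayesian network from training data and then using it to predict a missing value can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the variable names (`brand`, `type`, `price_band`), the toy records, the fixed network structure in `PARENTS`, and the use of Laplace smoothing. The abstract does not say whether the structure is learned or specified, so this sketch only estimates the conditional probability tables by counting.

```python
from collections import Counter, defaultdict

# Hypothetical structure (an assumption): each variable maps to its parents.
PARENTS = {"brand": [], "type": ["brand"], "price_band": ["brand", "type"]}


def learn_cpts(records, parents, alpha=1.0):
    """Estimate P(var | parents) by counting, with Laplace smoothing alpha."""
    domains = {v: sorted({r[v] for r in records}) for v in parents}
    counts = {v: defaultdict(Counter) for v in parents}
    for r in records:
        for v, ps in parents.items():
            counts[v][tuple(r[p] for p in ps)][r[v]] += 1

    def prob(var, value, assignment):
        # Look up the counts for this parent configuration, then smooth.
        ctx = counts[var][tuple(assignment[p] for p in parents[var])]
        total = sum(ctx.values()) + alpha * len(domains[var])
        return (ctx[value] + alpha) / total

    return domains, prob


def predict_missing(record, missing, parents, domains, prob):
    """Posterior over one missing variable, given all other fields.

    With every other variable observed, the posterior is proportional
    to the joint probability, i.e. the product of all CPT entries.
    """
    scores = {}
    for value in domains[missing]:
        full = dict(record, **{missing: value})
        p = 1.0
        for var in parents:
            p *= prob(var, full[var], full)
        scores[value] = p
    z = sum(scores.values())
    return {value: s / z for value, s in scores.items()}


# Invented toy training data, standing in for fully extracted records.
records = [
    {"brand": "acer", "type": "laptop", "price_band": "low"},
    {"brand": "acer", "type": "laptop", "price_band": "low"},
    {"brand": "acer", "type": "desktop", "price_band": "low"},
    {"brand": "apple", "type": "laptop", "price_band": "high"},
    {"brand": "apple", "type": "laptop", "price_band": "high"},
    {"brand": "apple", "type": "desktop", "price_band": "high"},
]

domains, prob = learn_cpts(records, PARENTS)

# An extracted record whose price_band could not be found in the source.
incomplete = {"brand": "apple", "type": "laptop"}
posterior = predict_missing(incomplete, "price_band", PARENTS, domains, prob)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

On this toy data the most probable value for the missing `price_band` is `high`. The same posterior could also serve as a tie-breaker when two extraction rules return conflicting values for a field, although how the paper actually resolves conflicts is not described in the abstract.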
Related resources
A Note on Evolutionary Rate Estimation in Bayesian Evolutionary Analysis: Focus on Pathogens
Bayesian evolutionary analysis provides a statistically sound and flexible framework for estimating evolutionary parameters. In this method, posterior estimates of the evolutionary rate (μ) are derived by combining evolutionary information in the data with the researcher’s prior knowledge about the true value of μ. Nucleotide sequence samples of fast evolving pathogens that are taken at d...
A Reference-set Approach to Information Extraction from Unstructured, Ungrammatical Data Sources
This thesis investigates information extraction from unstructured, ungrammatical text on the Web such as classified ads, auction listings, and forum postings. Since the data is unstructured and ungrammatical, this information extraction precludes the use of rule-based methods that rely on consistent structures within the text or natural language processing techniques that rely on grammar. Inste...
A Comparison of Two Ontology-Based Semantic Annotation Frameworks
The paper compares two semantic annotation frameworks that are designed for unstructured and ungrammatical domains. Both frameworks, namely ontoX (ontology-driven information Extraction) and BNOSA (Bayesian network and ontology based semantic annotation), extensively use ontologies during knowledge building, rule generation and data extraction phases. Both of them claim to be scalable as they a...
An Integrative Approach to Information Extraction
A huge amount of information is hidden within unstructured text. This information is often best exploited in structured or relational form, which is suited for many applications including Information Extraction. Information Extraction is the task of automatically extracting structured information from a given set of information, thus producing well-defined, categorized data from unstructured mach...
A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model
Information extraction (IE) is the process of automatically producing a structured representation from unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) that has been intensified by the growing volume, heterogeneity, and unstructured form of information. One of the core information extraction tasks is relation extraction wh...